3 research outputs found

    Π’Π΅ΠΊΡ‚ΠΎΡ€Π½ΠΎΠ΅ прСдставлСниС слов с сСмантичСскими ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΡΠΌΠΈ: ΡΠΊΡΠΏΠ΅Ρ€ΠΈΠΌΠ΅Π½Ρ‚Π°Π»ΡŒΠ½Ρ‹Π΅ наблюдСния

    Get PDF
    The ability to identify semantic relations between words has made a word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that a higher similarity can be reached if two words have a similar context. Each word can be represented as a vector, so the closest coordinates of vectors can be interpreted as similar words. It allows to establish semantic relations (synonymy, relations of hypernymy and hyponymy and other semantic relations) by applying an automatic extraction. The extraction of semantic relations by hand is considered as a time-consuming and biased task, requiring a large amount of time and some help of experts. Unfortunately, the word2vec model provides an associative list of words which does not consist of relative words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency, a position in an associative list, might be useful for improving results for the task of extraction of semantic relations for the Russian language by using word embedding. In the experiments, the word2vec model trained on the Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.Π’ΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎΡΡ‚ΡŒ ΠΈΠ΄Π΅Π½Ρ‚ΠΈΡ„ΠΈΠΊΠ°Ρ†ΠΈΠΈ сСмантичСской близости ΠΌΠ΅ΠΆΠ΄Ρƒ словами сдСлала модСль word2vec ΡˆΠΈΡ€ΠΎΠΊΠΎ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΠ΅ΠΌΠΎΠΉ Π² NLP-Π·Π°Π΄Π°Ρ‡Π°Ρ…. ИдСя word2vec основана Π½Π° контСкстной близости слов. КаТдоС слово ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ прСдставлСно Π² Π²ΠΈΠ΄Π΅ Π²Π΅ΠΊΡ‚ΠΎΡ€Π°, Π±Π»ΠΈΠ·ΠΊΠΈΠ΅ ΠΊΠΎΠΎΡ€Π΄ΠΈΠ½Π°Ρ‚Ρ‹ Π²Π΅ΠΊΡ‚ΠΎΡ€ΠΎΠ² ΠΌΠΎΠ³ΡƒΡ‚ Π±Ρ‹Ρ‚ΡŒ ΠΈΠ½Ρ‚Π΅Ρ€ΠΏΡ€Π΅Ρ‚ΠΈΡ€ΠΎΠ²Π°Π½Ρ‹ ΠΊΠ°ΠΊ Π±Π»ΠΈΠ·ΠΊΠΈΠ΅ ΠΏΠΎ смыслу слова. Π’Π°ΠΊΠΈΠΌ ΠΎΠ±Ρ€Π°Π·ΠΎΠΌ, ΠΈΠ·Π²Π»Π΅Ρ‡Π΅Π½ΠΈΠ΅ сСмантичСских ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΠΉ (ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΠ΅ синонимии, Ρ€ΠΎΠ΄ΠΎ-Π²ΠΈΠ΄ΠΎΠ²Ρ‹Π΅ ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΡ ΠΈ Π΄Ρ€ΡƒΠ³ΠΈΠ΅) ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ Π°Π²Ρ‚ΠΎΠΌΠ°Ρ‚ΠΈΠ·ΠΈΡ€ΠΎΠ²Π°Π½ΠΎ. УстановлСниС сСмантичСских ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΠΉ Π²Ρ€ΡƒΡ‡Π½ΡƒΡŽ считаСтся Ρ‚Ρ€ΡƒΠ΄ΠΎΠ΅ΠΌΠΊΠΎΠΉ ΠΈ Π½Π΅ΠΎΠ±ΡŠΠ΅ΠΊΡ‚ΠΈΠ²Π½ΠΎΠΉ Π·Π°Π΄Π°Ρ‡Π΅ΠΉ, Ρ‚Ρ€Π΅Π±ΡƒΡŽΡ‰Π΅ΠΉ большого количСства Π²Ρ€Π΅ΠΌΠ΅Π½ΠΈ ΠΈ привлСчСния экспСртов. Но срСди ассоциативных слов, сформированных с использованиСм ΠΌΠΎΠ΄Π΅Π»ΠΈ word2vec, Π²ΡΡ‚Ρ€Π΅Ρ‡Π°ΡŽΡ‚ΡΡ слова, Π½Π΅ ΠΏΡ€Π΅Π΄ΡΡ‚Π°Π²Π»ΡΡŽΡ‰ΠΈΠ΅ Π½ΠΈΠΊΠ°ΠΊΠΈΡ… ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΠΉ с Π³Π»Π°Π²Π½Ρ‹ΠΌ словом, для ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ³ΠΎ Π±Ρ‹Π» прСдставлСн ассоциативный ряд. Π’ Ρ€Π°Π±ΠΎΡ‚Π΅ Ρ€Π°ΡΡΠΌΠ°Ρ‚Ρ€ΠΈΠ²Π°ΡŽΡ‚ΡΡ Π΄ΠΎΠΏΠΎΠ»Π½ΠΈΡ‚Π΅Π»ΡŒΠ½Ρ‹Π΅ ΠΊΡ€ΠΈΡ‚Π΅Ρ€ΠΈΠΈ, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Π΅ ΠΌΠΎΠ³ΡƒΡ‚ Π±Ρ‹Ρ‚ΡŒ ΠΏΡ€ΠΈΠΌΠ΅Π½ΠΈΠΌΡ‹ для Ρ€Π΅ΡˆΠ΅Π½ΠΈΡ Π΄Π°Π½Π½ΠΎΠΉ ΠΏΡ€ΠΎΠ±Π»Π΅ΠΌΡ‹. НаблюдСния ΠΈ ΠΏΡ€ΠΎΠ²Π΅Π΄Π΅Π½Π½Ρ‹Π΅ экспСримСнты с общСизвСстными характСристиками, Ρ‚Π°ΠΊΠΈΠΌΠΈ ΠΊΠ°ΠΊ частота слов, позиция Π² ассоциативном ряду, ΠΌΠΎΠ³ΡƒΡ‚ Π±Ρ‹Ρ‚ΡŒ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Π½Ρ‹ для ΡƒΠ»ΡƒΡ‡ΡˆΠ΅Π½ΠΈΡ Ρ€Π΅Π·ΡƒΠ»ΡŒΡ‚Π°Ρ‚ΠΎΠ² ΠΏΡ€ΠΈ Ρ€Π°Π±ΠΎΡ‚Π΅ с Π²Π΅ΠΊΡ‚ΠΎΡ€Π½Ρ‹ΠΌ прСдставлСниСм слов Π² части опрСдСлСния сСмантичСских ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΠΉ для русского языка. Π’ экспСримСнтах ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΠ΅Ρ‚ΡΡ обучСнная Π½Π° корпусах Ѐлибусты модСль word2vec ΠΈ Ρ€Π°Π·ΠΌΠ΅Ρ‡Π΅Π½Π½Ρ‹Π΅ Π΄Π°Π½Π½Ρ‹Π΅ Викисловаря Π² качСствС ΠΎΠ±Ρ€Π°Π·Ρ†ΠΎΠ²Ρ‹Ρ… ΠΏΡ€ΠΈΠΌΠ΅Ρ€ΠΎΠ², Π² ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Ρ… ΠΎΡ‚Ρ€Π°ΠΆΠ΅Π½Ρ‹ сСмантичСскиС ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΡ. БСмантичСски связанныС слова (ΠΈΠ»ΠΈ Ρ‚Π΅Ρ€ΠΌΠΈΠ½Ρ‹) нашли своС ΠΏΡ€ΠΈΠΌΠ΅Π½Π΅Π½ΠΈΠ΅ Π² тСзаурусах, онтологиях, ΠΈΠ½Ρ‚Π΅Π»Π»Π΅ΠΊΡ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹Ρ… систСмах для ΠΎΠ±Ρ€Π°Π±ΠΎΡ‚ΠΊΠΈ СстСствСнного языка

    Word Embedding for Semantically Relative Words: an Experimental Study

    No full text
    The ability to identify semantic relations between words has made a word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that a higher similarity can be reached if two words have a similar context. Each word can be represented as a vector, so the closest coordinates of vectors can be interpreted as similar words. It allows to establish semantic relations (synonymy, relations of hypernymy and hyponymy and other semantic relations) by applying an automatic extraction. The extraction of semantic relations by hand is considered as a time-consuming and biased task, requiring a large amount of time and some help of experts. Unfortunately, the word2vec model provides an associative list of words which does not consist of relative words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency, a position in an associative list, might be useful for improving results for the task of extraction of semantic relations for the Russian language by using word embedding. In the experiments, the word2vec model trained on the Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing

    8th Russian Summer School in Information Retrieval (RuSSIR 2014)

    No full text
    corecore